The Effect of Alternative Scaling Approaches on the Performance of Different Supervised Learning Algorithms. An Empirical Study in the Case of Credit Scoring
Authors
Abstract
Building classification tools to discriminate between good and bad credit risks is a supervised learning task that can be solved using different approaches. In constructing such tools, a set of training data containing qualitative and quantitative attributes is generally used to learn the discriminant rules. In real-world credit applications, much of the available information about the customer and his payment behavior appears in qualitative categorical attributes. On the other hand, many supervised learning approaches require quantitative numerical input attributes, so qualitative attributes first have to be transformed into numerical ones before they can be used in the learning process. One very simple way to handle this problem is to code each possible value of every qualitative categorical attribute as a new, separate binary attribute. This increases the number of input attributes, which makes learning more complicated and less reliable. In particular, neural networks need more time for training and often lose accuracy. In this paper we consider different scaling approaches that transform categorical into numerical attributes without increasing the number of attributes. We use the scaled attributes as input variables to learn the discriminant rules in order to enhance the accuracy and stability of the rules. Using real-world credit data, we evaluate the different approaches and compare the results.
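The contrast the abstract draws can be illustrated with a small sketch. The binary coding below follows the description in the abstract. The abstract does not name the scaling approaches that were studied, so the second function is an assumption chosen only for illustration: it replaces each category by the share of bad risks observed for it, which keeps a single numerical column. The function names, the `housing` attribute, and the toy data are all hypothetical.

```python
from collections import defaultdict


def dummy_code(values):
    """Code each distinct category as its own 0/1 attribute."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def bad_rate_scale(values, labels):
    """Replace each category by the share of bad risks (label 1) seen for it.

    Illustrative assumption only; not necessarily a method from the paper.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [bad count, total count]
    for v, y in zip(values, labels):
        counts[v][0] += y
        counts[v][1] += 1
    return [counts[v][0] / counts[v][1] for v in values]


# Hypothetical toy data: one categorical attribute and a good (0) / bad (1) label.
housing = ["rent", "own", "rent", "free", "own", "rent"]
bad = [1, 0, 1, 0, 0, 0]

binary_columns, categories = dummy_code(housing)  # one column per category
scaled_column = bad_rate_scale(housing, bad)      # a single numerical column
```

With three categories, the binary coding turns one attribute into three input columns, while the scaled version stays at one. In practice a target-based scale like this would be estimated on the training data only, so that label information does not leak into the evaluation.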
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
Personal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)
Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...